MULTI-PRECISION TECHNIQUE FOR DIGITAL AUDIO ENCODER



Field of the Invention

This invention is applicable in the field of audio encoders, and in particular to those audio
encoders which may be implemented on fixed point arithmetic digital processors, such as for
professional and commercial applications.

Background of the Invention 

In order to more efficiently broadcast or record audio signals, the amount of information
required to represent the audio signals may be reduced. In the case of digital audio signals,
the amount of digital information needed to accurately reproduce the original pulse code
modulation (PCM) samples may be reduced by applying a digital compression algorithm,
resulting in a digitally compressed representation of the original signal. The goal of the
digital compression algorithm is to produce a digital representation of an audio signal which,
when decoded and reproduced, sounds the same as the original signal, while using a minimum
of digital information for the compressed or encoded representation.

Recent advances in audio coding technology have led to high compression ratios while
keeping audible degradation in the compressed signal to a minimum. These coders are
intended for a variety of applications, including 5.1 channel film soundtracks, HDTV, laser
discs and multimedia. Description of one applicable method can be found in the Advanced
Television Systems Committee (ATSC) Standard document entitled "Digital Audio
Compression (AC-3) Standard", Document A/52, 20 December, 1995, and the disclosure of
that document is hereby expressly incorporated herein by reference.

The implementation of an AC-3 encoder by translation of the requirements and processes
from the abovementioned AC-3 Standard onto the firmware of a Digital Signal Processor
(DSP) core involves several phases. Firstly, the essential compression algorithm blocks of
the AC-3 encoder have to be designed, since it is only the functions which are defined by the
standard. After individual blocks are completed, they are integrated into an encoding system
which receives a PCM (pulse code modulated) stream, processes the signal applying signal
processing techniques such as transient detection, frequency transformation, masking and
psychoacoustic analysis, and produces a compressed stream in the format of the AC-3
Standard.

The coded AC-3 stream should be capable of being decompressed by any standard AC-3
decoder, and the PCM stream generated thereby should be comparable in audio quality to the
original music stream. If the original stream and the decompressed stream are
indistinguishable in audible quality (at a reasonable level of compression) the development
moves to the third phase. If the quality is not transparent (indistinguishable), further
algorithmic development and improvements continue.

In the third phase the algorithms are simulated in a high level language (e.g. C) using the
word-length specifications of the target DSP core. Most commercial DSP cores allow only
fixed point arithmetic (since a floating point engine is costly in terms of integrated circuit
area). Consequently, the encoder algorithms are translated to a fixed point solution. The
word-length used is usually dictated by the ALU (arithmetic-logic unit) capabilities and bus
width of the target core. For example, an AC-3 encoder on a Motorola 56000 DSP would
use 24-bit precision since it is a 24-bit core. Similarly, for implementation on a Zoran
ZR38000, which has a 20-bit data path, 20-bit precision would be used.

If, for example, 20-bit precision is discovered to provide an unacceptable level of sound
quality, the provision to use double precision always exists. In this case each piece of data
is stored and processed as two segments, lower and upper words, each of 20-bit length. The
accuracy of implementation is doubled but so is the computational complexity, and double
precision multiplication could require 6 or more cycles where a single precision multiplication
and addition (MAC) may use only a single cycle. Block exponent and other boosting
techniques specific to the AC-3 encoder can be judiciously used to improve the quality, but
these features are not always found on commercial DSPs.

Single precision 24-bit AC-3 encoders are known to provide sufficient quality. However,
16-bit single precision AC-3 encoder quality is considered very poor. Consequently, the
implementation of AC-3 encoders on 16-bit DSP cores has not been popular. Since a
single precision 16-bit implementation of an AC-3 encoder results in unacceptable
reproduction quality, such a product would be at a distinct disadvantage in the consumer
market. On the other hand, double precision implementation is too computationally
expensive. It has been estimated that a fully double precision implementation would
require over 140 MIPS (million instructions per second). This exceeds what most
commercial DSPs can provide, and moreover, extra MIPS are always needed for system
software and value-added features.

Summary of the Invention

In accordance with the present invention, there is provided a method for coding digital
audio data with a transform encoding process implemented on a fixed point digital signal
processor having multiple levels of computation precision, wherein the transform encoding
process includes a plurality of computation stages involving arithmetic operations in
transforming the digital audio data into coded audio data, and wherein different ones of the
computation stages utilise different preselected levels of computational precision,
characterised in that:

the transform encoding process is in accordance with the AC-3 Digital Audio
Compression Standard.

The present invention also provides a digital audio transform encoder for coding digital
audio data into compressed audio data, comprising a fixed point digital signal processor
having multiple levels of computation precision, and transform encoding process code
stored in firmware or software for controlling the digital signal processor, wherein the
transform encoding process code includes a plurality of computation blocks involving
arithmetic operations in transforming the digital audio data into compressed audio data,
and wherein different ones of the computation blocks are performed by the digital signal
processor using different preselected levels of computational precision, characterised in
that:

the transform encoding process code is in accordance with the AC-3 Digital Audio
Compression Standard.

In a preferred form of the invention, the audio transform encoding system is implemented
on a 16-bit digital signal processor which is capable of single precision (16-bit)
computations and double precision (32-bit) computations. Accordingly, the preferred 16-bit
implementation uses combinations of single and double precision to best match the reference
floating point model. Thereby, computational complexity is reduced without sacrificing
quality excessively. The features of the preferred embodiment which are discussed in general
terms below are thus presented in the context of such an implementation.

For transient detection, single precision (16-bit) calculations can be used. The input stream
of PCM audio data is assumed to be 16 bits (else it is truncated to 16 bits for this stage) and
the high pass filter coefficients are restricted to 16 bits as well. The filtered 16-bit data is
segmented and analysed to detect transients. Simulation results with music streams indicate
that the result of this implementation matches the floating point version over 99% of the
time. Since this step involves only 16-bit operations it is termed 16-16 (data:coefficient)
processing.

The input 16-bit PCM is transformed to the frequency domain by first applying a window
with 32-bit length coefficients. Therefore windowing is 16-32 (data:coefficient) processing.
If the input PCM is 24-bit, then 32-16 processing for windowing may be used, wherein the
PCM data is treated as 32-bit (upper bits sign extended) and is multiplied by 16-bit window
coefficients.

Frequency transformation using the Modified Discrete Cosine Transform (MDCT) is performed
using 32-bit data and 16-bit coefficients. For each calculation, the input data is 32-bit and
is multiplied by the coefficients (sine and cosine terms), which are 16 bits in length. The
resulting 48-bit product is truncated to 32 bits for the next step of processing. This form of
frequency transformation with 32-16 processing can be shown to give 21-25 bit accuracy with
80% confidence, when compared with the floating point version.
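
The 32-16 multiplication described above can be sketched as follows. This is a minimal illustration only: the Q31/Q15 fractional formats and the helper names `to_fixed` and `mul_32x16` are assumptions made for the example, not details taken from the specification.

```python
def to_fixed(value, bits):
    """Quantise a float in [-1, 1) to a signed fixed-point integer of `bits` bits."""
    scale = 1 << (bits - 1)
    quantised = int(value * scale)              # truncate toward zero
    return max(-scale, min(scale - 1, quantised))


def mul_32x16(data_q31, coef_q15):
    """Multiply 32-bit data (assumed Q31) by a 16-bit coefficient (assumed Q15).

    The full product needs up to 48 bits; dropping the 15 least significant
    bits returns the result to a 32-bit Q31 value for the next stage.
    """
    product_48 = data_q31 * coef_q15            # Q46, up to 48 bits wide
    return product_48 >> 15                     # truncate back to Q31


data = to_fixed(0.3, 32)
coef = to_fixed(0.7071, 16)
result = mul_32x16(data, coef) / (1 << 31)      # back to float for inspection
```

The residual error in `result` is dominated by the 16-bit quantisation of the coefficient rather than by the final truncation.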

Each 32-bit frequency coefficient is assumed to be stored in two 16-bit registers. For phase
and coupling strategy calculations the upper 16 bits of the data can be utilised. Once the
strategy for combining the coupled channels to form the coupling channel is known, the
combining process uses the full 32-bit data. The computation is reduced while the accuracy
is still high. Simply truncating the 32-bit data to its upper 16 bits for the phase and
coupling strategy calculation leads to poor results (the strategy matches that from the
floating point version only 80% of the time), and thus a block exponent pre-processing method
can be employed. If the block exponent method is used, the coupling strategy is exactly the
same as the floating point result 97% of the time.
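
The benefit of block exponent pre-processing over plain truncation can be illustrated with a generic block floating point sketch. The specification's own method is the one described with reference to Figure 6; the code below is only an assumed, simplified stand-in, and the function names are hypothetical.

```python
def redundant_sign_bits(value, width=32):
    """Count the left shifts that keep `value` inside a signed `width`-bit word."""
    limit = 1 << (width - 1)
    shift = 0
    while shift < width - 1 and -limit <= (value << (shift + 1)) < limit:
        shift += 1
    return shift


def upper16_truncated(block):
    """Plain truncation: keep only the upper 16 bits of each 32-bit value."""
    return [v >> 16 for v in block]


def upper16_block_exponent(block):
    """Scale the whole block by a common shift before taking the upper 16 bits."""
    shift = min(redundant_sign_bits(v) for v in block)   # the block exponent
    return [(v << shift) >> 16 for v in block], shift


block = [1200, -45000, 300000]           # small-valued 32-bit coefficients
plain = upper16_truncated(block)         # almost all information is lost
scaled, exponent = upper16_block_exponent(block)
```

With plain truncation the three coefficients collapse to 0, -1 and 4, whereas the common shift preserves their relative sizes in the 16-bit words, which is what a sign or correlation decision needs.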

A rematrixing decision determines whether to code coefficients as left (L) and right (R)
channels, or as the sum (L + R) and difference (L - R) of the channels, and can be made using
the upper 16 bits of the 32-bit data. The actual rematrix coding of coefficients preferably
uses the full 32-bit data, as in the coupling calculations.

The remaining processing of the AC-3 encoding, including exponent coding, quantization and
bit allocation, is defined as fixed point arithmetic in the AC-3 Standard and therefore
word-length choices are not encountered in these calculations.

Brief Description of the Drawings

The invention is described in greater detail hereinafter, by way of example only, with
reference to the accompanying drawings, wherein:

Figure 1 is a system block diagram of an AC-3 compliant audio encoder;

Figure 2 is a comparison of 24-24 (data:coefficient) bit and 16-16 (data:coefficient) bit
wordlengths with floating point calculations for transient detection;

Figure 3 is a flow diagram of a transient detection process, wherein 16-32
(data:coefficient) bit precision is used for windowing operations while 32-16
(data:coefficient) is used for frequency transformation;

Figure 4 shows comparative charts of error probability of fixed point (32-16 & 24-24),
with the floating point calculation as reference, for the frequency transformation stage;

Figure 5 is a block diagram illustrating coupling coefficient generation and phase
estimation;

Figure 6 is a diagram illustrating block exponent processing;

Figures 7, 8 and 9 are frequency response charts of AC-3 encoder implementation in terms
of signal-to-noise ratio for floating point, 16-32 bit and 24 bit calculations, respectively.

Detailed Description of the Preferred Embodiments

In the following detailed description of the preferred embodiments of the invention, firstly
a system-level description of an AC-3 encoder is provided. This serves to explain the overall
processes and describe the significance of each processing block in the overall audio
compression system.

After the system level description, the word-length requirements of each processing block
where fixed point arithmetic is used are discussed. This includes the transient detection,
frequency transformation, rematrixing and coupling blocks. By analysis of data gathered
through extensive simulation, and statistics derived thence, appropriate word-length
requirements for each block are then estimated. In particular, this description deals with the
issue of the implementation of the dual-channel AC-3 encoder on a 16-bit processor in a
manner such that the processing requirement is not prohibitive and the quality is comparable
to implementation on a single precision 24-bit processor.

System Overview 

Like the AC-2 single channel coding technology from which it is derived, an AC-3 audio
coder is fundamentally an adaptive transform-based coder using a frequency-linear, critically
sampled filter bank based on the Princen-Bradley Time Domain Aliasing Cancellation
(TDAC) technique. An overall system block diagram of an AC-3 coder 10 is shown in
Figure 1. It may be noted that, of the blocks shown in Figure 1, blocks such as the Frame
Optimisation Tables 22, Fast Bit Allocation 21 and Spectral Reshaping 18 are not directly part
of the AC-3 Standard but are desirable for high quality audio reproduction and for reducing
the computational burden.

Audio Input Format 

AC-3 is a block structured coder, so one or more blocks of time domain signal, typically 512
samples per block and channel, are collected in an input buffer before proceeding with
additional processing.

Transient Detection

Transients are detected in the full-bandwidth channels in order to decide when to switch to
short length audio blocks, so as to restrict the quantization noise associated with the
transient to a small temporal region about the transient. The input audio signals are high-pass
filtered (12), and then examined by a transient detector (13) for an increase in energy from
one sub-block time segment to the next. Sub-blocks are examined at different time scales. If a
transient is detected in the second half of an audio block in a channel, that channel switches
to a short block (256 samples). In the presence of a transient the bit 'blksw' for the channel
in the encoded bit stream in the particular audio block is set.

The transient detector operates on 512 samples for every audio block. This is done in two
passes, with each pass processing 256 samples. Transient detection is broken down into four
steps:

1. high pass filtering;

2. segmentation of the block into sub-multiples;

3. peak amplitude detection within each sub-block segment; and

4. threshold comparison.

The transient detector outputs the flag blksw for each full-bandwidth channel, which when set
to 'one' indicates the presence of a transient in the second half of the 512 length input block
for the corresponding channel. The four stages of the transient detection are described in
further detail below.

1) High pass filtering: The high-pass filter can be implemented as a cascade biquad direct
form II IIR filter with a cut-off of 8 kHz.
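
A single direct form II biquad section can be sketched in floating point as below. The coefficient design (a second-order Butterworth-style high-pass from the widely used bilinear-transform formulas) and the 48 kHz sampling rate are assumptions chosen for illustration; the actual filter coefficients are not given in this text, and a cascade would apply several such sections in series.

```python
import math


def highpass_coefficients(cutoff_hz=8000.0, sample_rate_hz=48000.0,
                          q=math.sqrt(0.5)):
    """Assumed second-order high-pass design; returns (b, a) with a[0] == 1."""
    w0 = 2.0 * math.pi * cutoff_hz / sample_rate_hz
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    b = [(1.0 + cos_w0) / (2.0 * a0),
         -(1.0 + cos_w0) / a0,
         (1.0 + cos_w0) / (2.0 * a0)]
    a = [1.0, -2.0 * cos_w0 / a0, (1.0 - alpha) / a0]
    return b, a


def biquad_df2(samples, b, a):
    """Direct form II: one delay line (w1, w2) shared by both filter halves."""
    w1 = w2 = 0.0
    output = []
    for x in samples:
        w0 = x - a[1] * w1 - a[2] * w2              # recursive (pole) part
        output.append(b[0] * w0 + b[1] * w1 + b[2] * w2)   # zero part
        w2, w1 = w1, w0
    return output


b, a = highpass_coefficients()
dc_response = biquad_df2([1.0] * 200, b, a)                  # constant input
nyquist_response = biquad_df2([(-1.0) ** n for n in range(200)], b, a)
```

A constant input is rejected while an input alternating at half the sampling rate passes with unity gain, which is the behaviour expected of a high-pass section. In a 16-16 fixed point version the same recurrence would be run with 16-bit data and coefficients.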

2) Block segmentation: The block of 256 high-pass filtered samples is segmented into a
hierarchical tree of levels in which level 1 represents the 256 length block, level 2 is two
segments of length 128, and level 3 is four segments of length 64.

3) Peak detection: The sample with the largest magnitude is identified for each segment on
every level of the hierarchical tree. The peaks for a single level are found as follows:

$$P[j][k] = \max\bigl(x(n)\bigr)$$

$$\text{for } n = \frac{512\,(k-1)}{2^{j}},\ \frac{512\,(k-1)}{2^{j}} + 1,\ \ldots,\ \frac{512\,k}{2^{j}} - 1$$

$$\text{and } k = 1, \ldots, 2^{(j-1)};$$

where x(n) = the nth sample in the 256 length block

j = 1, 2, 3 is the hierarchical level number

k = the segment number within level j

4) Threshold comparison: The first stage of the threshold comparator checks to see if there
is significant signal level in the current block. This is done by comparing the overall peak
value P[1][1] of the current block to a "silence threshold". If P[1][1] is below this threshold
then a long block is forced. The silence threshold value is 100/32768. The next stage of the
comparator checks the relative peak levels of adjacent segments on each level of the
hierarchical tree. If the peak ratio of any two adjacent segments on a particular level exceeds
a pre-defined threshold for that level, then a flag is set to indicate the presence of a transient
in the current 256 length block.
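
Steps 2 to 4 can be sketched together as follows, assuming samples normalised to the range [-1, 1). The per-level ratio thresholds in `LEVEL_THRESHOLDS` are assumed values included only to make the example runnable, and the handling of the first segment on each level (compared with the last segment of the previous block, when supplied) is likewise an assumption of this sketch.

```python
SILENCE_THRESHOLD = 100 / 32768
LEVEL_THRESHOLDS = {1: 0.1, 2: 0.075, 3: 0.05}    # assumed, not from this text


def segment_peaks(block):
    """Return peaks[j][k] for levels j = 1..3 and segments k = 1..2**(j-1)."""
    if len(block) != 256:
        raise ValueError("expected a 256-sample high-pass filtered block")
    peaks = {}
    for j in (1, 2, 3):
        seg_len = 512 >> j                        # 256, 128, 64 samples
        peaks[j] = {
            k: max(abs(x) for x in block[seg_len * (k - 1):seg_len * k])
            for k in range(1, 2 ** (j - 1) + 1)
        }
    return peaks


def detect_transient(block, previous_peaks=None):
    """Return True when adjacent-segment peak ratios indicate a transient."""
    peaks = segment_peaks(block)
    if peaks[1][1] < SILENCE_THRESHOLD:
        return False                              # silence: force a long block
    for j in (1, 2, 3):
        last = 2 ** (j - 1)
        for k in range(1, last + 1):
            if k > 1:
                earlier = peaks[j][k - 1]
            elif previous_peaks is not None:
                earlier = previous_peaks[j][last]
            else:
                continue                          # nothing to compare against
            if peaks[j][k] * LEVEL_THRESHOLDS[j] > earlier:
                return True
    return False


quiet_then_loud = [0.01] * 128 + [0.9] * 128
steady = [0.5] * 256
silent = [0.0] * 256
```

Here `detect_transient(quiet_then_loud)` reports a transient because the second 128-sample segment is far louder than the first, while the steady and silent blocks do not trigger the flag.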

Time Domain Aliasing Cancellation (TDAC) Filter Bank 

The time domain input signal for each channel is individually windowed and filtered with
a TDAC-based analysis filter bank (11) to generate frequency domain coefficients. If the
blksw bit is set, meaning that a transient was detected for the block, then two short transforms
of length 256 each are taken, which increases the temporal resolution of the signal. If blksw
is not set, a single long transform of length 512 is taken, thereby providing a high spectral
resolution.

The output frequency sequence $X_k$ is defined as:

$$X_k = \sum_{n=0}^{N-1} x[n]\,\cos\!\left(\frac{2\pi\,(2n+1)\,(2k+1)}{4N} + \frac{\pi\,(2k+1)}{4}\right), \qquad k = 0, \ldots, (N/2 - 1)$$

where x[n] is the windowed input sequence for a channel and N is the transform length.

Instead of evaluating $X_k$ in the form given above it can be computed in a computationally
efficient manner as described in the specification of International Patent Application No.
PCT/SG98/C001fl entitled "A Fast Frequency Transformation Technique for Transform Audio
Coders". The disclosure of that document is hereby expressly incorporated herein by
reference, and an explanatory extract is presented below:

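Instead of evaluating $X_k$ in the form given above it could be computed as

$$X_k = \cos\gamma\,\bigl(g_{k,r}\cos(\pi(k+1/2)/N) - g_{k,i}\sin(\pi(k+1/2)/N)\bigr) - \sin\gamma\,\bigl(g_{k,r}\sin(\pi(k+1/2)/N) + g_{k,i}\cos(\pi(k+1/2)/N)\bigr)$$

$$g_{k,r},\ g_{k,i} \in \mathbb{R} \ \text{(set of real numbers)}$$

where $\gamma = \pi(2k+1)/4$ and
$G_k = g_{k,r} + j\,g_{k,i} = \sum_{n=0}^{N-1} \bigl(x[n]\,e^{j\pi n/N}\bigr)\,e^{j2\pi kn/N}$.
The symbol j represents the imaginary number $\sqrt{-1}$. The expression
$\sum_{n=0}^{N-1} \bigl(x[n]\,e^{j\pi n/N}\bigr)\,e^{j2\pi kn/N}$ is obtained from the
well known FFT method, by first using the transformation $x'[n] = x[n]\,e^{j\pi n/N}$ and then
computing the FFT $G_k = \sum_{n=0}^{N-1} x'[n]\,e^{j2\pi kn/N}$.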
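
The equivalence of the direct cosine sum and the pre-twiddled transform can be checked numerically. The sketch below assumes the positive-exponent convention $e^{+j\pi n/N}$ and $e^{+j2\pi kn/N}$ and the angle $\gamma = \pi(2k+1)/4$; it evaluates $G_k$ with a plain O(N²) sum where a real implementation would call an FFT routine, and the function names are hypothetical.

```python
import cmath
import math


def mdct_direct(x):
    """Evaluate X_k directly from the defining cosine sum."""
    n_len = len(x)
    return [
        sum(x[n] * math.cos(2 * math.pi * (2 * n + 1) * (2 * k + 1) / (4 * n_len)
                            + math.pi * (2 * k + 1) / 4)
            for n in range(n_len))
        for k in range(n_len // 2)
    ]


def mdct_via_complex_transform(x):
    """Evaluate X_k from G_k = sum_n (x[n] e^{j pi n/N}) e^{j 2 pi k n/N}."""
    n_len = len(x)
    pre_twiddled = [x[n] * cmath.exp(1j * math.pi * n / n_len)
                    for n in range(n_len)]
    result = []
    for k in range(n_len // 2):
        # Plain DFT-style sum standing in for the FFT.
        g = sum(pre_twiddled[n] * cmath.exp(2j * math.pi * k * n / n_len)
                for n in range(n_len))
        phi = math.pi * (k + 0.5) / n_len
        gamma = math.pi * (2 * k + 1) / 4
        result.append(
            math.cos(gamma) * (g.real * math.cos(phi) - g.imag * math.sin(phi))
            - math.sin(gamma) * (g.real * math.sin(phi) + g.imag * math.cos(phi))
        )
    return result


signal = [math.sin(0.3 * n) + 0.1 * n for n in range(16)]
direct = mdct_direct(signal)
fast = mdct_via_complex_transform(signal)
```

The two lists agree to within floating point rounding, which follows from expanding the cosine of a sum and factoring the term $e^{j\pi(k+1/2)/N}$ out of the summation.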

Coupling 

High compression can be achieved in AC-3 by use of a technique known as coupling.
Coupling takes advantage of the way the human ear determines directionality for high
frequency signals. At high audio frequencies (approximately above 4 kHz), the ear is
physically unable to detect individual cycles of an audio waveform and instead responds to
the envelope of the waveform. Consequently, the encoder 10 may include a coupling
processor (14) which combines the high frequency coefficients of the individual channels to
form a common coupling channel. The original channels combined to form the coupling
channel are called the coupled channels.

The most basic encoder can form the coupling channel by simply taking the average of all the
individual channel coefficients. A more sophisticated encoder could alter the signs of the
individual channels before adding them into the sum to avoid phase cancellation.

The generated coupling channel is sectioned into a number of bands. For each such band and
each coupled channel a coupling co-ordinate is transmitted to the decoder. To obtain the
high frequency coefficients in any band, for a particular coupled channel, from the coupling
channel, the decoder multiplies the coupling channel coefficients in that frequency band by
the coupling co-ordinate of that channel for that particular frequency band. For a dual
channel encoder, phase correction information is also sent for each frequency band of the
coupling channel.

Superior methods of coupling channel formation are discussed in the specifications of
International Patent Applications PCT/SG97/00076, entitled "Method and Apparatus for
Estimation of Coupling Parameters in a Transform Coder for High Quality Audio", and
PCT/SG97/00075, entitled "Method and Apparatus for Phase Estimation in a Transform Coder
for High Quality Audio". The disclosures of those specifications are hereby expressly
incorporated herein by reference. An explanatory extract from the latter specification is
presented below for reference.

"Assume that the frequency domain coefficients are identified as:

$a_i$, for the first coupled channel,

$b_i$, for the second coupled channel,

$c_i$, for the coupling channel.

For each sub-band, the value $\sum_i a_i b_i$ is computed, index i extending over the frequency
range of the sub-band. If $\sum_i a_i b_i > 0$, coupling for this sub-band is performed as
$c_i = (a_i + b_i)/2$. Similarly, if $\sum_i a_i b_i < 0$, then the coupling strategy for the
sub-band is performed as $c_i = (a_i - b_i)/2$.

Adjacent sub-bands using identical coupling strategies may be grouped together to form
one or more coupling bands. However, sub-bands with different coupling strategies must
not be banded together. If the overall coupling strategy for a band is $c_i = (a_i + b_i)/2$,
i.e. for all sub-bands comprising the band, the phase flag for the band is set to +1, else it is
set to -1."
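
The sub-band rule in the extract can be sketched as below. The function name is hypothetical, and treating a correlation of exactly zero as the in-phase case is an assumption, since the extract only covers the strictly positive and strictly negative cases.

```python
def couple_subband(a, b):
    """Form the coupling channel for one sub-band of two coupled channels.

    Returns (c, phase_flag): phase_flag is +1 when c = (a + b) / 2 is used
    and -1 when c = (a - b) / 2 is used.
    """
    correlation = sum(ai * bi for ai, bi in zip(a, b))
    phase_flag = 1 if correlation >= 0 else -1    # zero case: assumed in-phase
    c = [(ai + phase_flag * bi) / 2 for ai, bi in zip(a, b)]
    return c, phase_flag


in_phase = couple_subband([1.0, 2.0], [1.0, 2.0])
out_of_phase = couple_subband([1.0, 2.0], [-1.0, -2.0])
```

In the out-of-phase case a plain average would cancel to zero, whereas the sign-corrected combination retains the full coefficient values.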

Rematrixing 

An additional process, rematrixing (15), is invoked in the special case that the encoder is
processing two channels only. The sum and difference of the two signals from each channel
are calculated on a band by band basis, and if, in a given band, the level disparity between
the derived (matrixed) signal pair is greater than the corresponding level of the original
signal, the matrix pair is chosen instead. More bits are provided in the bit stream to indicate
this condition, in response to which the decoder performs a complementary unmatrixing
operation to restore the original signals. The rematrix bits are omitted if the coded channels
are more than two.
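
One way to realise the per-band decision is sketched below. The energy comparison used here (matrix the band only when the sum or difference signal has the smallest energy of the four candidates) is an assumed criterion chosen for illustration, and the function name is hypothetical.

```python
def rematrix_band(left, right):
    """Decide whether to code one band as (L, R) or as ((L+R)/2, (L-R)/2).

    Returns (rematrix_flag, first_channel, second_channel).
    """
    energy = {
        "L": sum(l * l for l in left),
        "R": sum(r * r for r in right),
        "L+R": sum((l + r) ** 2 for l, r in zip(left, right)),
        "L-R": sum((l - r) ** 2 for l, r in zip(left, right)),
    }
    smallest = min(energy, key=energy.get)
    if smallest in ("L", "R"):
        return False, list(left), list(right)     # keep the original pair
    total = [(l + r) / 2 for l, r in zip(left, right)]
    difference = [(l - r) / 2 for l, r in zip(left, right)]
    return True, total, difference


nearly_mono = rematrix_band([0.5, -0.25], [0.5, -0.25])
one_sided = rematrix_band([1.0, 0.0], [0.0, 0.0])
```

For identical left and right signals the difference channel is all zeros, so the matrixed pair is selected; for a signal present in one channel only, the original pair is kept.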

The benefit of this technique is that it avoids directional unmasking if the decoded signals are
subsequently processed by a matrix surround processor, such as a Dolby Pro Logic (TM) decoder.

In AC-3, rematrixing is performed independently in separate frequency bands. There are four
bands with boundary locations dependent on the coupling information. The boundary
locations are by coefficient bin number, and the corresponding rematrixing band frequency
boundaries change with sampling frequency.

Conversion to Floating Point

The transformed values, which may have undergone the rematrix and coupling processes, are
converted to a specific floating point representation at the exponent extraction block (16),
resulting in separate arrays of binary exponents and mantissas. This floating point
arrangement is maintained throughout the remainder of the coding process, until just prior
to the decoder's inverse transform, and provides 144 dB dynamic range, as well as allowing
AC-3 to be implemented on either fixed or floating point hardware.
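
Exponent extraction can be sketched as counting leading zeros of a fractional coefficient, up to a limit of 24, and scaling the coefficient accordingly. The function name and the use of Python floats for the coefficient are assumptions for illustration; the sketch returns the leading-zero count as a non-negative number.

```python
import math


def extract_exponent(coefficient, max_exponent=24):
    """Split a coefficient in (-1, 1) into (leading_zeros, mantissa).

    leading_zeros is capped at max_exponent, so very small coefficients keep
    leading zeros in their mantissa.
    """
    magnitude = abs(coefficient)
    leading_zeros = 0
    while leading_zeros < max_exponent and magnitude * 2.0 < 1.0:
        magnitude *= 2.0
        leading_zeros += 1
    mantissa = coefficient * (1 << leading_zeros)
    return leading_zeros, mantissa


# 24 binary places of scaling correspond to roughly 144 dB of range.
dynamic_range_db = 20.0 * math.log10(2.0 ** 24)
```

For example, `extract_exponent(0.3)` gives one leading zero with mantissa 0.6, and a zero coefficient is assigned the maximum exponent of 24.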

Coded audio information consists essentially of separate representations of the exponent and
mantissa arrays. The remaining coding process focuses individually on reducing the exponent
and mantissa data rates.

The exponents are coded using one of the exponent coding strategies. Each mantissa is
truncated to a fixed number of binary places. The number of bits to be used for coding each
mantissa is obtained from a bit allocation algorithm which is based on the masking
property of the human auditory system.

Exponent Coding Strategy 

Exponent values in AC-3 are allowed to range from 0 to -24. The exponent acts as a scale
factor for each mantissa, equal to $2^{exp}$. Exponents for coefficients which have more than 24
leading zeros are fixed at -24 and the corresponding mantissas are allowed to have leading
zeros.

The AC-3 bit stream contains exponents for the independent, coupled and coupling channels.
Exponent information may be shared across blocks within a frame, so blocks 1 through 5 may
reuse exponents from previous blocks.

AC-3 exponent transmission employs a differential coding technique, in which the exponents
for a channel are differentially coded across frequency. The first exponent is always sent as
an absolute value. The value indicates the number of leading zeros of the first transform
coefficient. Successive exponents are sent as differential values which must be added to the
prior exponent value to form the next actual exponent value.
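
The differential scheme can be sketched as a simple encode and decode pair. The function names are hypothetical, and no limit on the size of each differential is enforced here.

```python
def differential_encode(exponents):
    """Return the first exponent as an absolute value plus successive differences."""
    first = exponents[0]
    differences = [current - previous
                   for previous, current in zip(exponents, exponents[1:])]
    return first, differences


def differential_decode(first, differences):
    """Rebuild the exponents by adding each difference to the prior exponent."""
    exponents = [first]
    for difference in differences:
        exponents.append(exponents[-1] + difference)
    return exponents


original = [3, 4, 4, 2, 3, 5]
first, differences = differential_encode(original)
restored = differential_decode(first, differences)
```

Because neighbouring spectral exponents tend to be similar, the differences are small numbers that can be coded in fewer bits than the absolute values.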



The differentially encoded exponents are next combined into groups. The grouping is done by
one of three methods: D15, D25 and D45. These, together with 'reuse', are referred to as
exponent strategies. The number of exponents in each group depends only on the exponent
strategy. In the D15 mode, each group is formed from three exponents. In D45, four
exponents are represented by one differential value. Next, three consecutive such
representative differential values are grouped together to form one group. Each group always
comprises 7 bits. In case the strategy is 'reuse' for a channel in a block, then no exponents
are sent for that channel and the decoder reuses the exponents last sent for this channel.
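
A 7-bit group can hold three differentials if each is limited to five possible values, since 5 × 5 × 5 = 125 fits within 128 codes. The sketch below assumes differentials in the range -2 to +2 and a base-5 packing; that mapping and the function names are assumptions made for illustration.

```python
def pack_d15_group(d1, d2, d3):
    """Pack three differentials, each assumed to lie in -2..+2, into one 7-bit code."""
    for d in (d1, d2, d3):
        if not -2 <= d <= 2:
            raise ValueError("differential out of the assumed -2..+2 range")
    return 25 * (d1 + 2) + 5 * (d2 + 2) + (d3 + 2)


def unpack_d15_group(code):
    """Recover the three differentials from a packed 7-bit code."""
    m1, remainder = divmod(code, 25)
    m2, m3 = divmod(remainder, 5)
    return m1 - 2, m2 - 2, m3 - 2


code = pack_d15_group(1, -2, 0)
```

The largest possible code is 124, so every group fits in 7 bits.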

Pre-processing of exponents prior to coding can lead to better audio quality. One such
processing technique is described in the specification of International Patent Application
PCT/SG98/0002 entitled "Method and Apparatus for Spectral Exponent Reshaping in a
Transform Coder for High Quality Audio", the disclosure of which is incorporated herein by
reference.

Choice of the suitable strategy for exponent coding forms an important aspect of AC-3, and
in the encoder 10 shown in Figure 1 is performed by the process blocks 17, 18. D15
provides the highest accuracy but is low in compression. On the other hand, transmitting only
one exponent set for a channel in the frame (in the first audio block of the frame) and
attempting to 'reuse' the same exponents for the next five audio blocks can lead to high
exponent compression but also sometimes very audible distortion.

Several methods exist for determination of exponent strategy, and one such method is
described in the specification of International Patent Application No. PCT/SG98/00009 entitled
"A Neural Network Based Method for Exponent Coding in a Transform Coder for High
Quality Audio".



Bit Allocation for Mantissas 

The bit allocation algorithm (block 21) analyses the spectral envelope of the audio signal
being coded, with respect to masking effects, to determine the number of bits to assign to
each transform coefficient mantissa. In the encoder, the bit allocation is recommended to be
performed globally on the ensemble of channels as an entity, from a common bit pool.

The bit allocation routine contains a parametric model (psycho-acoustic analysis block 20) of
human hearing for estimating a noise level threshold, expressed as a function of
frequency, which separates audible from inaudible spectral components. Various parameters
of the hearing model can be adjusted by the encoder depending upon the signal characteristics.
For example, a prototype masking curve is defined in terms of two piecewise continuous line
segments, each with its own slope and y-intercept.

Word-Length Requirements of Processing Blocks 

Floating point arithmetic usually uses the procedures set out in IEEE 754 (i.e. a 32-bit
representation, with a 24-bit significand of which 23 bits are stored, an 8-bit exponent and
1 sign bit), which is adequate for high quality AC-3 encoding. Workstations like the Sun
SPARCstation 20 (TM) can provide much higher precision (e.g. double precision is 8 bytes).
However, floating point units require greater integrated circuit area and consequently most
DSP processors use fixed point arithmetic. The AC-3 encoder, in use, is often intended to be
a part of a consumer product, e.g. DVD RAM (Digital Versatile Disk Readable and Writeable),
where cost (chip area) is an important factor.

Being aware of the cost versus quality issue in the development of the AC-3 standard, Dolby
Laboratories has ensured that the algorithms can be implemented on fixed point processors.
However, an important issue is what word-length is required of the fixed point processor for
processing high quality audio signals.

The AC-3 encoder has been implemented on 24-bit processors such as the Motorola 56000
and has met with much commercial success. However, although the performance of an AC-3
encoder implemented on a 16-bit processor is universally assumed to be of low quality, no
adequate study has been conducted to benchmark the quality or compare it with the floating
point version.

As discussed above, using double precision (32-bit) to implement the encoder on a 16-bit
processor can lead to high quality (even more than a 24-bit processor implementation).
However, double precision arithmetic is very computationally expensive (e.g. on the D950 a
single precision multiplication takes 1 cycle whereas double precision requires 6 cycles).
Accordingly, rather than performing single or double precision throughout the whole
encoding process, an analysis can be performed to determine adequate precision requirements
for each stage of computation.

In the description that follows, for simplicity of expression (and to avoid repetition), the
following convention has been adopted. The notation x-y (set A:set B) implies that, for the
process, data elements within Set A are limited or truncated to x bits while the Set B elements
are y bits long. For example, 16-32 (data:window) implies that, for windowing, the data was
truncated to 16 bits and the window coefficients to 32 bits. When the notation appears without
any parenthesised explanation, e.g. x-y, an explanation of the implied meaning is generally
provided. If no explanation is provided the meaning will be clear from the context, and the
brevity of expression has taken precedence over repetition of the same idea.

Based on extensive simulations and study of the statistics derived therefrom, it has been
determined that the different stages of the encoder can be suitably implemented with different
combinations of computational precision, such as 16-32, 32-16, 16-16 and 32-32. Suitable
trade-offs in terms of MIPS and quality are therefore made subject to the statistics obtained,
and the computational strategies which may be adopted for various processing stages as a
result are discussed below.

Transient Detection

In a simulation, the high-pass filtering and the subsequent segment analysis for transient
detection were performed with 16 and 24-bit word-lengths (both single precision). The input
PCM stream is assumed to be 16-bit and the filter coefficients are truncated to 16 bits also.
The output of the filter is a 16-bit number which is analysed for transients. Thus, this process
is entirely 16-16 (data:coefficient).

The simulation results are compared with the floating point version resulting from processing
with a Sun SPARCstation 20 (TM). For the simulation, five music samples were used,
namely a) Drums, b) Harp, c) Piano, d) Saxophone and e) Vocal. Although not exhaustive,
it is believed that these are sufficient to provide a good example of complex audio streams.
Figure 2 of the accompanying drawings is a graph of transient detection, with a comparison
of 16-16 (data:coefficient) and 24-24 (data:coefficient) wordlengths with the floating point
results. As is evident from the chart, the 16-16 result matches the floating point result over
99% of the time.

From Figure 2 it is evident that for more than 99% of cases, the 16-bit output from Transient
Detection 13 (in terms of the blksw information) is the same as the floating point version. This
implies that for this stage of processing double precision computation adds little benefit, and
16-bit single precision is adequate. Simulation results for 24-24 (data:coefficient) are also
shown in the Figure.
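
By way of illustration only, the meaning of 16-16 (data:coefficient) processing for a filtering step can be sketched in Python as follows. The two-tap difference filter and the helper names q15, sat16 and fir_16x16 are assumptions made for this sketch; the filter is not the high-pass filter defined for transient detection in the AC-3 Standard.

```python
import math

def q15(x):
    """Truncate a coefficient in [-1, 1) to a 16-bit Q15 integer."""
    return max(-32768, min(32767, math.floor(x * 32768)))

def sat16(v):
    """Saturate an integer to the signed 16-bit range."""
    return max(-32768, min(32767, v))

def fir_16x16(samples, coeffs_q15):
    """Filter 16-bit samples with 16-bit coefficients, giving 16-bit outputs."""
    out = []
    for n in range(len(samples)):
        acc = 0  # wide accumulator for the 16x16 -> 32-bit products
        for k, c in enumerate(coeffs_q15):
            if n - k >= 0:
                acc += samples[n - k] * c
        out.append(sat16(acc >> 15))  # truncate back to 16 bits
    return out

# Hypothetical high-pass (difference) filter: y[n] = 0.5*x[n] - 0.5*x[n-1]
coeffs = [q15(0.5), q15(-0.5)]
steady = fir_16x16([1000, 1000, 1000, 1000], coeffs)  # constant input is rejected
step = fir_16x16([0, 0, 20000, 20000], coeffs)        # a sudden jump passes through
```

In this sketch a steady input produces no output after the first sample, whereas an abrupt change produces a large output sample, which is the kind of event the subsequent segment analysis looks for.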

Forward Transform

Windowing

The audio block is multiplied by a window function to reduce transform boundary effects and
to improve frequency selectivity in the filter bank 11. The values of the window function are
included in the ATSC specification document referred to above. If the input audio is considered
to be 16-bit then for the windowing operation a data wordlength of more than 16 bits is
unnecessary. For implementation on a 16-bit processor the window coefficients can be 16
or 32-bit. In general, 16-bit coefficients are inadequate and it is recommended that 32 bits
be used for the windowing coefficients. Moreover, this step forms the baseline for further
processing and limiting accuracy at this stage is not reasonable. However, if the input stream
is 24-bit then 32-16 (data:coefficient) processing can be performed.
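
The effect of the coefficient wordlength on the windowing step may be illustrated by the following Python sketch, which multiplies one 16-bit sample by a small window value held either as a 32-bit (Q31) or as a 16-bit (Q15) coefficient. The window value, the sample value and the function names are hypothetical and chosen only to make the difference visible.

```python
def window_16x32(sample_q15, coeff_q31):
    """16-bit Q15 sample times 32-bit Q31 coefficient; result kept as 32-bit Q31."""
    return (sample_q15 * coeff_q31) >> 15

def window_16x16(sample_q15, coeff_q15):
    """Same operation with a 16-bit Q15 coefficient; result scaled to Q31."""
    return (sample_q15 * coeff_q15) << 1

w = 0.00012345   # hypothetical small window value
sample = 20000   # hypothetical 16-bit PCM sample
exact = (sample / 2**15) * w

# Coefficients are truncated to the available wordlength before use.
err32 = abs(window_16x32(sample, int(w * 2**31)) / 2**31 - exact)
err16 = abs(window_16x16(sample, int(w * 2**15)) / 2**31 - exact)
```

With these assumed values the 16-bit coefficient retains only about two significant bits of the window value, so err16 is several orders of magnitude larger than err32.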



Time to Frequency Transformation 

Based on the block switch flags, each audio block is transformed into the frequency domain
by performing one long 512-point transform, or two short 256-point transforms. Each
windowed data value is 32 bits long. For the frequency transformation stage, the coefficient
(cosine and sine terms) length is restricted to 16 bits. Thus, using the previous terminology,
this is 32-16 (data:coefficient) computation.

The advantage of 32-16 precision is that the computational burden is not as much as for the
32-32 (pure double-precision) version. On the D950, 32-16 multiplication takes 3 cycles while
32-32 requires 6 cycles.
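
The difference in burden can be seen by decomposing the products into 16x16 partial products, as a 16-bit multiplier would have to do. The following Python sketch is illustrative only; the function names are assumptions, and the mapping of partial products onto actual processor cycles depends on the device.

```python
def mul_32x16(x32, c16):
    """Signed 32-bit x times signed 16-bit c using two 16x16 partial products."""
    xh = x32 >> 16       # signed upper half
    xl = x32 & 0xFFFF    # unsigned lower half
    return ((xh * c16) << 16) + xl * c16

def mul_32x32(x32, y32):
    """Signed 32x32 multiply using four 16x16 partial products."""
    xh, xl = x32 >> 16, x32 & 0xFFFF
    yh, yl = y32 >> 16, y32 & 0xFFFF
    return ((xh * yh) << 32) + ((xh * yl + xl * yh) << 16) + xl * yl
```

The 32-16 product needs two partial products and one addition, whereas the 32-32 product needs four partial products and three additions, which is consistent with the roughly doubled cost quoted above.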

Figure 3 illustrates a transient detection procedure which is entirely 16-bit; 16-32
(data:coefficient) precision is used for the windowing operation while 32-16 (data:coefficient)
is used for the frequency transformation. From Figure 3, note that the windowing
coefficients are 32-bit while the input data (CD quality) is 16-bit. The 32-bit window is
multiplied by the 16-bit data to generate 32-bit data. This 32-bit windowed signal is
converted to the frequency domain using the Modified Discrete Cosine Transform (MDCT).
The 32-16 precision is compared with the floating point version and the 24-24 bit version in
Table 1, below, and the mean and the standard deviation of the error are tabulated.

            Drums         Harp          Piano         Saxophone     Vocal
            e      σ      e      σ      e      σ      e      σ      e      σ
16-32       0.1    499    0.03   122    0.04   106    0.02   104    0.02   94.3
24-24       0.1    127    0.13   128    0.12   129    0.1    127    0.15   124

* all data has been pre-scaled (multiplied) by 10^5.

Table 1. Frequency Transformation Stage: Mean (e) and Standard Deviation (σ) of the error
between the floating-point and the fixed-point (32-16 & 24-24) implementations.

It can be observed that the mean error is about 0.0000005, wherein the discrepancy is usually
at the 20th binary place. However, since the standard deviation (σ) is much larger than the
mean (e), it reflects more truly the behaviour of the different implementations. Given the set
of observations, it is often convenient to condense and summarise the data by fitting it to a
model that depends on adjustable parameters (in this case the error depends on the adjustable
word-length). Therefore it is instructive to analyse the probability distribution of the error
function.

Figure 4 shows two charts of error probability for the frequency transformation stage for
32-16 and 24-24 fixed point computations with the floating-point version as reference. The
probability distribution is based on simulation results with a sample space of 40,000. From the
Figure it can be observed that 80% of the time 21 to 25 bit accuracy exists for the 32-16
implementation. For the 24-24, the same is true for the range 18 to 21 bits. Assuming a
Gaussian distribution for the error function (which is reasonable, looking at the probability
distribution in the Figure), it can be stated that for 32-16, 99.7% of the time the error
is less than 0.005 (3σ). The low value is highly influenced by the statistics from the drums
section of the audio input. For 24-24, with 99.7% confidence, the error is less than 0.003
(3σ). From Figure 4, it can also be noted that the spread of the error function is less for
24-24, which implies a more stable performance as compared to 32-16. This figure of merit
function, though not accurate, at least serves to highlight that both the implementations have
reasonably high accuracy.
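
The statistics referred to above can be computed along the following lines. This Python sketch is a generic illustration and not the simulation code used to obtain the figures; the function names are assumptions, and the 3σ bound is meaningful only under the Gaussian assumption already stated.

```python
import math
import statistics

def error_stats(reference, fixed):
    """Mean, standard deviation and 3-sigma bound of the fixed-point error."""
    errors = [f - r for r, f in zip(reference, fixed)]
    mean = statistics.mean(errors)
    sigma = statistics.pstdev(errors)
    return mean, sigma, 3 * sigma

def bits_of_accuracy(reference, fixed):
    """Binary place at which one fixed-point value departs from the reference."""
    err = abs(fixed - reference)
    return math.inf if err == 0 else -math.log2(err)
```

For example, an output that differs from the floating point reference by 2^-20 is counted here as accurate to 20 bits, and a histogram of this quantity over all coefficients gives an error probability chart of the kind described.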

Coupling Process 

The computational requirements for the coupling process are quite appreciable, which makes
selection of appropriate precision more difficult. The input to the coupling process is the
channel coefficients, each of 32-bit length. The coupling progresses in several stages. For
each such stage an appropriate word length must be determined.

Coupling Channel Generation Strategy 

As discussed hereinabove, the coupling channel generation strategy is linked to the product
Σ a_i*b_i, where a_i and b_i are the two coupled channel coefficients within the band in question.
Although 32-32 (double precision) computation for the dot product would lead to more
accurate results, it is also computationally prohibitive. An important issue, however, is that
the output of this stage only influences how the coupling channel is generated, not the
accuracy of the coefficients themselves. If the error from 16-bit computation is not
appreciably large, the computational burden can be decreased.

Figure 5 is a block diagram of the coupling process 30. In the process shown in this Figure,
16-bit (upper half) single precision only is utilised for the coupling coefficient generation
strategy and phase estimation. The actual coupling is then performed on the full 32-bit data.
Coupling co-ordinates may also be generated using single precision.

As shown in the Figure, for phase estimation and coupling coefficient generation strategy
(31), the upper 16 bits of the full 32-bit data from the frequency transformation stage may
be used. The actual coupling coefficient generation of c_i = (a_i ± b_i)/2 (33) is performed using
32-32 (a_i:b_i) precision.
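
The division of labour between the two precisions may be sketched in Python as follows. It is assumed here, for illustration only, that the sign in c_i = (a_i ± b_i)/2 is taken from the sign of the dot product of the two channels within the band; the function names are likewise assumptions.

```python
def upper16(v):
    """Upper 16 bits of a signed 32-bit value (single precision view)."""
    return v >> 16

def coupling_channel(a32, b32):
    """Choose the sign from 16-bit data, then combine the full 32-bit data."""
    # Strategy: dot product of the band computed on the upper halves only.
    dot = sum(upper16(a) * upper16(b) for a, b in zip(a32, b32))
    sign = 1 if dot >= 0 else -1  # assumed rule: add in phase, subtract out of phase
    # Generation: full-precision combination of the 32-bit coefficients.
    return [(a + sign * b) >> 1 for a, b in zip(a32, b32)]
```

Only the decision is taken on truncated data, so an occasional wrong decision changes how the coupling channel is formed without reducing the precision of the coefficients themselves.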

A similar approach of 16-16 (a_i:b_i) is used for the coupling co-ordinate generation (34, 35).
However, the final division involved in the co-ordinate generation should preferably be done
with the highest precision possible. For this it is recommended that the floating point operation
be emulated, that is, each operand is represented by an exponent (equivalent to the number of
leading zeros) and a mantissa (the remaining 16 bits after removal of leading zeros). The
division can then be performed using the best possible method as provided by the processor to
provide maximum accuracy. Since coupling co-ordinates anyway need to be converted to floating
point format (exponent and mantissa) for final transmission, this approach has a dual benefit.
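
A minimal sketch of such an emulated division, assuming positive non-zero 32-bit operands, is given below in Python. The function names and the Q15 format chosen for the mantissa quotient are assumptions of this sketch rather than features of any particular processor.

```python
def normalise(x):
    """Split a positive 32-bit integer into (16-bit mantissa, leading-zero count)."""
    lz = 32 - x.bit_length()
    mantissa = (x << lz) >> 16  # top 16 bits after removing leading zeros
    return mantissa, lz

def emulated_divide(num, den):
    """Return (mantissa quotient in Q15, power-of-two exponent) for num/den."""
    m_num, lz_num = normalise(num)
    m_den, lz_den = normalise(den)
    quotient_q15 = (m_num << 15) // m_den  # single-precision mantissa division
    exponent = lz_den - lz_num             # the scale is carried separately
    return quotient_q15, exponent

def to_float(quotient_q15, exponent):
    """Convert the (mantissa, exponent) pair back to a real number for checking."""
    return quotient_q15 / 2**15 * 2.0**exponent
```

Because both mantissas are normalised before the division, small operands lose no more precision than large ones, and the exponent is already available in the form needed for transmission.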

            Band 0          Band 1          Band 2          Band 3
Vocal       98.6   97.8     97.5   100      98.6   99.5     96.5   100

Table 2. Coupling Strategy: the coupling strategy for each band with the 24-24 and the 16-16
approaches is compared (in percentage %) with the floating point version. While 24-24 gives
superior results, the 16-16 fares badly.

Table 2, above, illustrates comparative results of coupling strategies in bands for the
simulation audio data, using the floating point calculations as a reference. The results for
16-16 are not as desired. Upon analysis of the reason for the low performance it can be shown
that usually the coupling coefficients are of low value. Thus, even though the coupling
coefficient may be represented by 32 bits, the higher 16 bits are normally almost all zeros.
Therefore simple truncation to the upper 16 bits produces poor results. A variation of the
block exponent strategy, discussed below, can be used to improve the results.

Figure 6 is a diagram illustrating block exponent processing, showing a pre-processing stage
which can be implemented before truncation of the 32-bit data to 16-bit for the phase estimation,
coupling coefficient generation strategy and calculation of the coupling co-ordinates. In this
procedure, the coefficients within the band (or sub-band depending on the level of processing)
are analysed to find the minimum number of leading zeros (in an actual implementation the
maximum absolute value rather than the leading zeros is used for scaling). The entire coefficient
set within the band is then shifted (equivalent to multiplication) to the left and then the
remaining upper 16 bits are utilised for the processing. Note that for the phase estimation and
coupling strategy the multiplication factor has no effect as long as both the left and right
channels within the band have been shifted by the same number of bits.

For the coupling co-ordinate generation phase, both the coupling and the coupled channels
should have the same multiplication factor so that they cancel out. Alternatively, if floating
point emulation is used as recommended above, the coupling and coupled channels may be
on different scales. The difference in scale is compensated in the exponent value of the final
coupling co-ordinate. Consider, for example, that a band has only 4 bins, 96...99:
a[96] = (0000 0000 0000 0000 1100 0000 0000 1001)
b[96] = (0000 0000 0000 0000 0000 0000 0000 0100)
c[96] = (0000 0000 0000 0000 0110 0000 0000 0110)

a[97] = (0000 0000 0000 0000 1100 0000 0000 0000)
b[97] = (0000 0000 0000 0000 0001 0000 0000 1000)
c[97] = (0000 0000 0000 0000 0110 1000 0000 0100)

a[98] = (0000 0000 0000 0000 0000 0000 0000 1000)
b[98] = (0000 0000 0000 0000 0000 0000 0000 1100)
c[98] = (0000 0000 0000 0000 0000 0000 0000 1010)

a[99] = (0000 0000 0000 0000 1100 0000 0000 1000)
b[99] = (0000 0000 0000 0001 0000 0000 0000 1100)
c[99] = (0000 0000 0000 0000 1110 0000 0000 1010)

*Note: for this example c_i = (a_i + b_i)/2

Considering only the upper 16 bits in this case will clearly lead to a poor result. For example,
the coupling co-ordinate Y_a = (Σa_i^2 / Σc_i^2) will be zero, thereby wiping away all frequency
components within the band for channel a when the coupling coefficient is multiplied by the
coupling co-ordinate at the decoder to reproduce the coefficients for channel a. However,
by removing the leading zeros, the new coefficients for channel a will be:

a[96] = (00 1100 0000 0000 10)
a[97] = (00 1100 0000 0000 00)
a[98] = (00 0000 0000 0000 10)
a[99] = (00 1100 0000 0000 10)

on which more meaningful measurements can be performed. The scaling factor will have to be
compensated in the exponent value for the coupling co-ordinate. With this approach the
performance of phase estimation with 16-16 bit processing improves drastically, as illustrated
by the results shown in Table 3, as compared to the figures in Table 2.

Table 3. Coupling strategy for the two implementations (16-16) and (24-24) as compared (in
percentage %) to the floating point version. By use of the block exponent method the accuracy
of the 16-16 version is much improved compared to the figures in Table 2.
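
The block exponent pre-processing can be sketched in Python using the four-bin example given above. It is assumed here that the common shift is the minimum leading-zero count over both coupled channels in the band, less one bit reserved for the sign; this assumption is made only because it reproduces the shifted values listed for channel a, and the function names are hypothetical.

```python
def leading_zeros32(x):
    """Leading zeros of the magnitude of x in a 32-bit word."""
    return 32 - abs(x).bit_length()

def block_exponent_shift(*channels):
    """Common left shift for all coefficients of all channels in the band."""
    min_lz = min(leading_zeros32(v) for ch in channels for v in ch)
    return max(min_lz - 1, 0)  # assumed: one bit is kept for the sign

def upper16_after_shift(channel, shift):
    """Shift left, then keep the upper 16 bits of each 32-bit coefficient."""
    return [(v << shift) >> 16 for v in channel]

a = [0xC009, 0xC000, 0x0008, 0xC008]    # bins 96..99 of channel a
b = [0x0004, 0x1008, 0x000C, 0x1000C]   # bins 96..99 of channel b
shift = block_exponent_shift(a, b)
a16 = upper16_after_shift(a, shift)
naive = [v >> 16 for v in a]            # plain truncation loses everything
```

The shift applied must be recorded so that it can be compensated in the exponent of the coupling co-ordinate, as noted in the text.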

Accordingly, as shown in Figure 5, the coupling co-ordinates may be calculated using 16-bit
values only. The pre-processing stage of the 32-bit numbers before truncation again serves
to improve results appreciably. From Table 4, below, it is evident that both the 24-24 and
the 16-32 versions have similar performance.

Table 4. Mean (e) and standard deviation (σ) of the error between the floating point and the
16-16 (with block exponent) and 24-24 versions. The figures are almost the same for both
implementations.

Rematrixing

The upper 16 bits of the 32-bit data resulting from the frequency transformation stage may
be utilised to determine rematrixing for each band, in a manner similar to the coupling phase
estimation. Within each rematrixing band, power measurements are made for the left channel
(L), the right channel (R), and the channels resulting from the sum (L+R) and difference (L-R).

If the maximum power is found in the L+R or L-R signal, then the rematrix flag is set
for that band, and L+R and L-R are encoded instead of L and R. For the encoding process
the full 32-bit data is used to provide maximum accuracy.

If the maximum power is in L or R, the rematrixing flag is not set for that band and the 32-bit
data moves directly to the encoding process. Table 5 below compares the 16-bit version (as just
described) to the floating point version. The high figures indicate that for computing the
rematrixing flag, the above described block exponent method is not necessary.

Table 5. Comparison (in percentage %) of the rematrixing flag for the floating point and the
16-16 (without block exponent) and 24-24 versions. The high figures (94% - 100%) for the
16-16 indicate that the block exponent procedure is not very necessary.
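
The rematrixing decision described above may be sketched in Python as follows, using only the upper 16 bits of each coefficient for the power measurements. The function name and the treatment of ties (the flag is not set when the powers are equal) are assumptions of this sketch.

```python
def rematrix_flag(left32, right32):
    """Decide rematrixing for one band from the upper 16 bits of the coefficients."""
    l = [v >> 16 for v in left32]
    r = [v >> 16 for v in right32]
    p_left = sum(x * x for x in l)
    p_right = sum(y * y for y in r)
    p_sum = sum((x + y) ** 2 for x, y in zip(l, r))
    p_diff = sum((x - y) ** 2 for x, y in zip(l, r))
    # Flag is set only when the maximum power lies in L+R or L-R.
    return max(p_sum, p_diff) > max(p_left, p_right)
```

Only the flag is derived from the truncated data; when the flag is set, the sum and difference signals that are actually encoded would still be formed from the full 32-bit coefficients.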

Results

Figures 7, 8 and 9 are frequency response charts in terms of signal-to-noise ratio for the
three discussed implementations, namely floating point, 24-24 bit and 16-32 bit
calculations, respectively. This result is obtained by encoding-decoding 100 dB sinusoids
at discrete frequency points, for the encoder version in question. The output from the
decoder is compared with the original sinusoid to estimate the SNR. Note from the graph
that the floating-point version gives an average SNR of 85 dB (16-bit PCM has an SNR of 96
dB). The SNR measurement does not take the masking and psychoacoustic effects into
consideration, but nevertheless gives a number with which to compare different
implementations. The frequency response shown in Figure 8 is of the 24-24 AC-3
encoder, which implies that for all processing single precision arithmetic with a register
length of 24 bits was assumed. On the other hand, the frequency response shown in Figure
9 is of the 16-32 AC-3 encoder, which in this context implies: 16-16 for transient
detection, 16-32 for windowing, 32-16 for frequency transformation, 16-16 for coupling
(determining phase and coupling co-ordinate), 32-32 for coupling channel generation,
16-16 for calculation of the rematrixing flag and 32-32 for the rematrixing process.
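
The SNR estimate described above can be sketched in Python as follows. The 16-bit quantiser used here is merely a hypothetical stand-in for an encode-decode chain, and the sampling rate, test frequency and function names are assumptions; an actual measurement would pass the sinusoid through the encoder and decoder under test.

```python
import math

def snr_db(original, decoded):
    """Signal-to-noise ratio in dB of a decoded signal against the original."""
    signal = sum(s * s for s in original)
    noise = sum((d - s) ** 2 for s, d in zip(original, decoded))
    return math.inf if noise == 0 else 10 * math.log10(signal / noise)

def quantise_16bit(samples):
    """Hypothetical stand-in for encode-decode: round to 16-bit PCM and back."""
    return [round(s * 32767) / 32767 for s in samples]

fs, freq, n = 48000, 997.0, 48000   # assumed test conditions
tone = [math.sin(2 * math.pi * freq * i / fs) for i in range(n)]
print(round(snr_db(tone, quantise_16bit(tone)), 1))
```

Repeating the measurement at a set of discrete frequencies gives one point per frequency, from which a chart of the kind shown in Figures 7 to 9 can be drawn.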

Claims:

1. A method for coding digital audio data with a transform encoding process
implemented on a fixed point digital signal processor having multiple levels of
computation precision, wherein the transform encoding process includes a plurality of
computation stages involving arithmetic operations in transforming the digital audio data
into coded audio data, and wherein different ones of the computation stages utilise
different preselected levels of computational precision, characterised in that:

the transform encoding process is in accordance with AC-3 Digital Audio
Compression Standard.

2. A method as claimed in claim 1, wherein the digital signal processor comprises a
16-bit digital signal processor which is capable of single (16-bit) precision computations
and double (32-bit) computations.

3. A method as claimed in claim 1 or 2, wherein the plurality of computation stages
includes transient detection, windowing, frequency transformation, coupling strategy
determination and coupling channel computation, and rematrixing determination and
computation.

4. A method as claimed in claim 1 or 2, wherein the transform encoding process 
includes a transient detection process for detecting transients in the audio data, and 
wherein the transient detection process is carried out with single precision computations. 

5. A method as claimed in claim 1 or 2, wherein the transform encoding process
includes a windowing function which is carried out with single precision audio data and
double precision coefficients.

6. A method as claimed in claim 1 or 2, wherein the transform encoding process
includes a windowing function which is carried out with double precision audio data and
single precision coefficients.

7. A method as claimed in claim 1 or 2, wherein the transform encoding process
includes a frequency transformation process which is performed with double precision data
and single precision coefficients.

8. A method as claimed in claim 1 or 2, wherein the transform encoding process
includes determination of a coupling strategy and/or a phase strategy, and wherein the
determination is performed with single precision data.

9. A method as claimed in claim 8, wherein the determination of coupling and/or
phase strategy includes pre-processing by use of a block exponent method, wherein double
precision frequency coefficients are shifted to eliminate leading zeros and truncated to
single precision.

10. A method as claimed in claim 8 or 9, wherein the transform encoding process
includes the formation of a coupling channel which is performed with double precision
data.

11. A method as claimed in claim 1 or 2, wherein the transform encoding process
includes a rematrixing determination which is performed with single precision data, and a
rematrix coding process which is performed with double precision data.

12. A digital audio transform encoder for coding digital audio data into compressed
audio data, comprising a fixed point digital signal processor having multiple levels of
computation precision, and transform encoding process code stored in firmware or
software for controlling the digital signal processor, wherein the transform encoding
process code includes a plurality of computation blocks involving arithmetic operations in
transforming the digital audio data into compressed audio data, and wherein different ones
of the computation blocks are performed by the digital signal processor using different
preselected levels of computational precision, characterised in that:

the transform encoding process code is in accordance with AC-3 Digital Audio
Compression Standard.

13. An audio transform encoder as claimed in claim 12, wherein the digital signal
processor comprises a 16-bit digital signal processor which is capable of single (16-bit)
precision computations and double (32-bit) computations.

14. An audio transform encoder as claimed in claim 12 or 13, wherein the plurality of
computation blocks include transient detection, windowing, frequency transformation,
coupling strategy determination and coupling channel computation, and rematrixing
determination and computation.

15. An audio transform encoder as claimed in claim 12 or 13, wherein the transform
encoding process code includes a transient detection block for detecting transients in the
audio data, and wherein the transient detection block utilises single precision
computations.

16. An audio transform encoder as claimed in claim 12 or 13, wherein the transform
encoding process code includes a windowing block which utilises single precision audio
data and double precision coefficients.

17. An audio transform encoder as claimed in claim 12 or 13, wherein the transform
encoding process code includes a windowing block which utilises double precision audio
data and single precision coefficients.

18. An audio transform encoder as claimed in claim 12 or 13, wherein the transform
encoding process code includes a frequency transformation block which utilises double
precision data and single precision coefficients.

19. An audio transform encoder as claimed in claim 12 or 13, wherein the transform
encoding process code includes a block for determination of a coupling strategy and/or a
phase strategy, and wherein the determination utilises single precision data.

20. An audio transform encoder as claimed in claim 19, wherein the block for
determination of coupling and/or phase strategy utilises pre-processing by use of a block
exponent method, wherein double precision frequency coefficients are shifted to eliminate
leading zeros and truncated to single precision.

21. An audio transform encoder as claimed in claim 19 or 20, wherein the transform
encoding process code includes a block for the formation of a coupling channel which
utilises double precision data.

22. An audio transform encoder as claimed in claim 12 or 13, wherein the transform
encoding process code includes a rematrixing determination block which utilises single
precision data, and a rematrix coding block which utilises double precision data.